A Framework for Transparent Execution of Massively-Parallel Applications on CUDA and OpenCL
Abstract
We present a novel framework for developing applications for several massively parallel platforms at once. Currently, our framework supports CUDA and OpenCL, but it can be easily adapted to other programming languages. The main idea is to provide an easy-to-use abstraction layer that encapsulates calls to the user's own parallel device code as well as to library functions. With our framework, the code has to be written only once and can then be used transparently with CUDA and OpenCL. The output is a single binary file, and the application can decide at run time which particular GPU method it will use. This enables us to support new features of specific platforms while maintaining compatibility. We have applied our framework to a typical project using CUDA and ported it easily to OpenCL. Furthermore, we present a comparison of the running times of the ported library on the different supported platforms.
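The run-time selection described in the abstract can be illustrated with a minimal C++ sketch. This is not the authors' API: `Backend`, `CudaBackend`, `OpenClBackend`, `Platform`, and `make_backend` are hypothetical names chosen for illustration, and a plain CPU loop stands in for the real CUDA and OpenCL kernel launches so that the example compiles without either toolkit. It only shows the general shape of one interface, several platform implementations in the same binary, and a choice made while the program runs.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical abstraction layer: the application only sees this interface.
class Backend {
public:
    virtual ~Backend() = default;
    virtual std::string name() const = 0;
    // Computes y[i] = a * x[i] + y[i] for every element.
    virtual void saxpy(float a, const std::vector<float>& x,
                       std::vector<float>& y) const = 0;
};

namespace detail {
// CPU stand-in for device code (assumption: a real backend would launch
// a CUDA kernel or enqueue an OpenCL kernel here instead).
inline void saxpy_host(float a, const std::vector<float>& x,
                       std::vector<float>& y) {
    if (x.size() != y.size()) throw std::invalid_argument("size mismatch");
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = a * x[i] + y[i];
}
}  // namespace detail

class CudaBackend : public Backend {
public:
    std::string name() const override { return "CUDA"; }
    void saxpy(float a, const std::vector<float>& x,
               std::vector<float>& y) const override {
        detail::saxpy_host(a, x, y);  // placeholder for a CUDA launch
    }
};

class OpenClBackend : public Backend {
public:
    std::string name() const override { return "OpenCL"; }
    void saxpy(float a, const std::vector<float>& x,
               std::vector<float>& y) const override {
        detail::saxpy_host(a, x, y);  // placeholder for an OpenCL enqueue
    }
};

enum class Platform { Cuda, OpenCl };

// Both implementations are linked into the same binary; the caller picks
// one at run time, for example from a command-line flag or a device query.
std::unique_ptr<Backend> make_backend(Platform p) {
    switch (p) {
        case Platform::Cuda:   return std::make_unique<CudaBackend>();
        case Platform::OpenCl: return std::make_unique<OpenClBackend>();
    }
    throw std::invalid_argument("unknown platform");
}
```

Application code would call `make_backend(...)` once and then use only the `Backend` interface, so the same call site works unchanged whichever platform was selected.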
Related resources
A Scalable Software Framework for Stateful Stream Data Processing on Multiple GPUs and Applications
During the past few years the increase of computational power has been realized using more processors with multiple cores and specific processing units like Graphics Processing Units (GPUs). Also, the introduction of programming languages such as CUDA and OpenCL makes it easy, even for non-graphics programmers, to exploit the computational power of massively parallel processors available in cur...
Implementing the Dslash Operator in OpenCL (Technical Report WM-CS-2010-03, College of William & Mary, Department of Computer Science)
The Dslash operator is used in Lattice Quantum Chromodynamics (LQCD) applications to implement a Wilson-Dirac sparse matrix-vector product. Typically the Dslash operation has been implemented as a parallel program. Today’s Graphics Processing Units (GPUs) are designed to do highly parallel numerical calculations for 3D graphics rendering. This design works well with scientific applications such ...
The Design and Implementation of Ocelot’s Dynamic Binary Translator from PTX to Multi-Core x86
Ocelot is a dynamic compilation framework designed to map the explicitly parallel PTX execution model used by NVIDIA CUDA applications onto diverse many-core architectures. Ocelot includes a dynamic binary translator from PTX to many-core processors that leverages the LLVM code generator to target x86. The binary translator is able to execute CUDA applications without recompilation and Ocelot c...
On the Complexity of Robust Source-to-Source Translation from CUDA to OpenCL
The use of hardware accelerators in high-performance computing has grown increasingly prevalent, particularly due to the growth of graphics processing units (GPUs) as general-purpose (GPGPU) accelerators. Much of this growth has been driven by NVIDIA’s CUDA ecosystem for developing GPGPU applications on NVIDIA hardware. However, with the increasing diversity of GPUs (including those from AMD, AR...
High-Level Programming of Stencil Computations on Multi-GPU Systems Using the SkelCL Library
The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA. This makes development of stencil applications a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high...
Publication date: 2015